Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement partition compaction grouper #6172

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

alexqyle
Copy link
Contributor

@alexqyle alexqyle commented Aug 20, 2024

What this PR does:

This PR implements partition compaction grouper.

Introduced new files for partition compaction:

  • partitioned_group_info: This file acts like a compaction plan. It contains the information that how source blocks from compaction time range being assigned to partitions for compaction. partitionedGroupID in the file is unique for particular time range.
  • partition_visit_marker: Visit marker file for each partition under compaction. This could prevent multiple compactors from working on the same partition compaction. Similar to block visit marker.

Here is high level algorithm of partition compaction grouper:

  1. Group blocks by time range
  2. Load existing partitioned_group_info files
  3. Gathering information of each time range and check which time range where grouper can take compaction job from
  4. Create partitioned groups from grouped blocks
  5. Sanitize partitions from each partitioned group
  6. Return ready to compact partitioned groups to Thanos for compaction

Introduced meta_extensions to save partition information of result block in meta.json. This infomation can be used to better assign block to proper partition in the next round of compaction.

Which issue(s) this PR fixes:
NA

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@danielblando
Copy link
Contributor

danielblando commented Dec 20, 2024

Overall looking good to me. Just a few comments.

For user to migrate. Would it just work changing the configuration and deploying? I would imagine yes as we would not find partition data to any block and treat them all as partitionId 0. Correct?

@alexqyle
Copy link
Contributor Author

Overall looking good to me. Just a few comments.

For customer to migrate. Would it just work changing the configuration and deploying? I would imagine yes as we would not find partition data to any block and treat them all as partitionId 0. Correct?

Yes. Switching back and forth between partitioning and non partitioning should not cause any issue. At most, the largest time range block would be recompacted one more time.

@danielblando
Copy link
Contributor

How it works while deployment is happening? Because we can have compactors creating blocks with partition and compactors creating others without and they are seeing different visit markers? Would it create duplicate compaction while deployment is happening?

@alexqyle
Copy link
Contributor Author

How it works while deployment is happening? Because we can have compactors creating blocks with partition and compactors creating others without and they are seeing different visit markers? Would it create duplicate compaction while deployment is happening?

If both are compacting the largest time range blocks, it would create duplicate blocks. For any lower level blocks, it would be compacted into higher level properly after deployment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants